
How to access data stored in the Dremio data warehouse (for data analysts)


Connecting to Deakin's network

The first step in accessing the data warehouse is connecting to Deakin's network, which requires downloading the AnyConnect VPN client and using it to connect.

Once there, select your operating system, which will redirect you to a guide explaining how to download, install, and connect to Deakin's network. (If you get a permission error saying you don't have access, ensure you are logged in at the top right after the redirection.)
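
Once connected, you can optionally confirm that the VPN is working before moving on. The snippet below is a minimal sketch that simply checks whether the query API host used later in this guide (10.137.0.149 on port 5001) is reachable; it is not part of the API itself.

import socket

# Quick connectivity check: try to open a TCP connection to the API host used
# later in this guide. If this fails, the VPN is likely not connected.
try:
    with socket.create_connection(("10.137.0.149", 5001), timeout=5):
        print("VPN connection looks good: the API host is reachable.")
except OSError as exc:
    print(f"Could not reach the API host - check your VPN connection. ({exc})")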


Making a SQL request to Dremio in Jupyter

The first step is ensuring you have the correct packages installed. The packages required for this guide can be installed with this command.

pip install requests pandas
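
If you are running this from a Jupyter notebook cell rather than a terminal, you can use the built-in pip magic instead, which installs into the notebook's own environment:

%pip install requests pandas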

Then import these into your notebook.

import requests
import json
import pandas as pd

After that, declare the API URL exactly as below.

api_url = "http://10.137.0.149:5001/dremio_query"

Then declare the headers.

headers = {
    "Content-Type": "application/json"
}

Then declare the SQL query you wish to run against the database.

sql_query = {
    "sql": "SELECT * FROM \"project-3\".\"extended_activities\" LIMIT 10;"
}

As of writing this documentation, users of this API are restricted to SELECT queries only, to prevent malicious use. There are also two usable sources, project-3 and project-2, though the tables within those sources are subject to change, in which case this documentation will likely be updated with a directory guide.
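
Because the table list may change, it can be useful to discover what is currently available. The query below is only a sketch: it assumes the API permits SELECT queries against Dremio's INFORMATION_SCHEMA, which is not confirmed by the restrictions above. If it is rejected, rely on the directory guide instead. It can be sent with the same POST request pattern shown next.

# Hypothetical discovery query - assumes SELECT access to INFORMATION_SCHEMA is allowed.
# Lists the schemas and tables available under the project-3 source.
list_tables_query = {
    "sql": "SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.\"TABLES\" WHERE TABLE_SCHEMA LIKE 'project-3%'"
}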


Then send the POST request and store the response.

response = requests.post(api_url, headers=headers, data=json.dumps(sql_query))
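
It is worth confirming the request succeeded before parsing it. A minimal check, assuming the API signals failures with an HTTP error status code:

# Raise an exception if the API returned an HTTP error status (4xx/5xx),
# e.g. if the SQL was rejected or the service is unavailable.
response.raise_for_status()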


Parse the JSON response.

result = response.json()

Finally, convert it into a DataFrame.

df = pd.DataFrame(result['rows'])

You can then use this just like any other DataFrame, for example:

display(df)
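
For convenience, the steps above can be wrapped into a single helper function. This is just a sketch of the same flow shown in this guide, not part of the API itself; it assumes the response always contains a 'rows' key as in the example above.

import json

import pandas as pd
import requests

API_URL = "http://10.137.0.149:5001/dremio_query"
HEADERS = {"Content-Type": "application/json"}

def run_dremio_query(sql: str) -> pd.DataFrame:
    """Send a SELECT query to the Dremio query API and return the rows as a DataFrame."""
    response = requests.post(API_URL, headers=HEADERS, data=json.dumps({"sql": sql}))
    response.raise_for_status()
    result = response.json()
    return pd.DataFrame(result["rows"])

# Example usage in a notebook:
df = run_dremio_query('SELECT * FROM "project-3"."extended_activities" LIMIT 10;')
display(df)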